Report on the TREC-10 Experiment: Distributed Collections and Entrypage Searching

Authors

Jacques Savoy

Yves Rasolofo

Abstract

For our participation in TREC-10, we focused on searching distributed collections and on designing and implementing a new search strategy for finding homepages. The first part of this paper presents a new merging strategy based on the lengths of the retrieved lists. The second part develops our approach to building retrieval models that combine Web page and URL address information when searching for online service locations.


Similar resources

Report on the TREC-8 Experiment: Searching on the Web and in Distributed Collections

The Internet paradigm permits information searches to be made across wide-area networks where information is contained in web pages and/or whole document collections such as digital libraries. These new distributed information environments reveal new and challenging problems for the IR community. Consequently, in this TREC experiment we investigated two questions related to information searches...


Report on the TREC-9 Experiment: Link-based Retrieval and Distributed Collections

The web and its search engines have resulted in a new paradigm, generating new challenges for the IR community which are in turn attracting a growing interest from around the world. The decision by NIST to build a new and larger test collection based on web pages represents a very attractive initiative. This motivated us at TREC-9 to support and participate in the creation of this new corpus, t...


Applying Inference Networks to Multiple Collection Searching

The paper describes how to use inference networks to solve two problems in searching multiple collections: collection selection and result merging. The effectiveness of the approaches is demonstrated with the INQUERY system and 3 gigabyte TREC collections.


Distributed Multisearch and Resource Selection for the TREC Million Query Track

A distributed information retrieval system with resource‐selection and result‐set merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million Query Track. The GOV2 collection was partitioned into host‐name subcollections and distributed to multiple remote machines. The Multisearch demonstration application restricted each search to a fraction of the avail...


Lucene for n-grams using the CLUEWeb Collection

The ARSC team made modifications to the Apache Lucene engine to accommodate "go words," taken from the Google Gigaword vocabulary of n‐grams. Indexing the Category "B" subset of the ClueWeb collection was accomplished by a divide-and-conquer method, working across the separate ClueWeb subsets for 1, 2 and 3‐grams. Phrase searching, or imposing an order on query terms, has traditionally been a...




Publication date: 2001
